Art of Assembly: Chapter Six-5

[Chapter Six][Previous] [Next] [Art of Assembly][Randall Hyde]

Art of Assembly: Chapter Six

6.9 - Program Flow Control Instructions
6.9.1 - Unconditional Jumps
6.9.2 - The CALL and RET Instructions
6.9.3 - The INT, INTO, BOUND, and IRET Instructions
6.9.4 - The Conditional Jump Instructions
6.9.5 - The JCXZ/JECXZ Instructions
6.9.6 - The LOOP Instruction
6.9.7 - The LOOPE/LOOPZ Instruction
6.9.8 - The LOOPNE/LOOPNZ Instruction
6.10 - Miscellaneous Instructions

6.9 Program Flow Control Instructions

The instructions discussed thus far execute sequentially; that is, the CPU executes each instruction in the sequence it appears in your program. To write real programs requires several control structures, not just the sequence. Examples include the if statement, loops, and subroutine invocation (a call). Since compilers reduce all other languages to assembly language, it should come as no surprise that assembly language supports the instructions necessary to implement these control structures. 80x86 program control instructions belong to three groups: unconditional transfers, conditional transfers, and subroutine call and return instructions. The following sections describe these instructions:

6.9.1 Unconditional Jumps

The jmp (jump) instruction unconditionally transfers control to another point in the program. There are six forms of this instruction: an intersegment/direct jump, two intrasegment/direct jumps, an intersegment/indirect jump, and two intrasegment/indirect jumps. Intrasegment jumps are always between statements in the same code segment. Intersegment jumps can transfer control to a statement in a different code segment.

These instructions generally use the same syntax, it is

	jmp	target

The assembler differentiates them by their operands:

        jmp     disp8   ;direct intrasegment, 8 bit displacement.
        jmp     disp16  ;direct intrasegment, 16 bit displacement.
        jmp     adrs32  ;direct intersegment, 32 bit segmented address.
        jmp     mem16   ;indirect intrasegment, 16 bit memory operand.
        jmp     reg16   ;register indirect intrasegment.
        jmp     mem32   ;indirect intersegment, 32 bit memory operand.

Intersegment is a synonym for far, intrasegment is a synonym for near.

The two direct intrasegment jumps differ only in their length. The first form consists of an opcode and a single byte displacement. The CPU sign extends this displacement to 16 bits and adds it to the ip register. This instruction can branch to a location -128..+127 from the beginning of the next instruction following it (i.e., -126..+129 bytes around the current instruction).

The second form of the intrasegment jump is three bytes long with a two byte displacement. This instruction allows an effective range of -32,768..+32,767 bytes and can transfer control to anywhere in the current code segment. The CPU simply adds the two byte displacement to the ip register.

These first two jumps use a relative addressing scheme. The offset encoded as part of the opcode byte is not the target address in the current code segment, but the distance to the target address. Fortunately, MASM will compute the distance for you automatically, so you do not have to compute this displacement value yourself. In many respects, these instructions are really nothing more than add ip, disp instructions.

The direct intersegment jump is five bytes long, the last four bytes containing a segmented address (the offset in the second and third bytes, the segment in the fourth and fifth bytes). This instruction copies the offset into the ip register and the segment into the cs register. Execution of the next instruction continues at the new address in cs:ip. Unlike the previous two jumps, the address following the opcode is the absolute memory address of the target instruction; this version does not use relative addressing. This instruction loads cs:ip with a 32 bit immediate value.

For the three direct jumps described above, you normally specify the target address using a statement label. A statement label is usually an identifier followed by a colon, usually on the same line as an executable machine instruction. The assembler determines the offset of the statement after the label and automatically computes the distance from the jump instruction to the statement label. Therefore, you do not have to worry about computing displacements manually. For example, the following short little loop continuously reads the parallel printer data port and inverts the L.O. bit. This produces a square wave electrical signal on one of the printer port output lines:

                mov     dx, 378h        ;Parallel printer port address.
LoopForever:    in      al, dx          ;Read character from input port.
                xor     al, 1           ;Invert the L.O. bit.
                out     dx, al          ;Output data back to port.
                jmp     LoopForever     ;Repeat forever.

The fourth form of the unconditional jump instruction is the indirect intrasegment jump instruction. It requires a 16 bit memory operand. This form transfers control to the address within the offset given by the two bytes of the memory operand. For example,

WordVar         word    TargetAddress
                 .
                 .
                 .
                jmp     WordVar

transfers control to the address specified by the value in the 16 bit memory location WordVar. This does not jump to the statement at address WordVar, it jumps to the statement at the address held in the WordVar variable. Note that this form of the jmp instruction is roughly equivalent to:

		mov	ip, WordVar

Although the example above uses a single word variable containing the indirect address, you can use any valid memory address mode, not just the displacement only addressing mode. You can use memory indirect addressing modes like the following:

                jmp     DispOnly        ;Word variable
                jmp     Disp[bx]        ;Disp is an array of words
                jmp     Disp[bx][si]
                jmp     [bx]
                etc.

Consider the indexed addressing mode above for a moment (disp[bx]). This addressing mode fetches the word from location disp+bx and copies this value to the ip register; this lets you create an array of pointers and jump to a specified pointer using an array index. Consider the following example:

AdrsArray       word    stmt1, stmt2, stmt3, stmt4
                 .
                 .
                 .
                mov     bx, I           ;I is in the range 0..3
                add     bx, bx          ;Index into an array of words.
                jmp     AdrsArray[bx]   ;Jump to stmt1, stmt2, etc., depending
                                        ; on the value of I.

The important thing to remember is that the near indirect jump fetches a word from memory and copies it into the ip register; it does not jump to the memory location specified, it jumps indirectly through the 16 bit pointer at the specified memory location.

The fifth jmp instruction transfers control to the offset given in a 16 bit general purpose register. Note that you can use any general purpose register, not just bx, si, di, or bp. An instruction of the form

		jmp	ax

is roughly equivalent to

		mov	ip, ax

Note that the previous two forms (register or memory indirect) are really the same instruction. The mod and r/m fields of a mod-reg-r/m byte specify a register or memory indirect address. See Appendix D for the details.

The sixth form of the jmp instruction, the indirect intersegment jump, has a memory operand that contains a double word pointer. The CPU copies the double word at that address into the cs:ip register pair. For example,

FarPointer      dword   TargetAddress
                 .
                 .
                 .
                jmp     FarPointer

transfers control to the segmented address specified by the four bytes at address FarPointer. This instruction is semantically identical to the (mythical) instruction

		lcs ip, FarPointer			;load cs, ip from FarPointer

As for the near indirect jump described earlier, this far indirect jump lets you specify any arbitrary (valid) memory addressing mode. You are not limited to the displacement only addressing mode the example above uses.

MASM uses a near indirect or far indirect addressing mode depending upon the type of the memory location you specify. If the variable you specify is a word variable, MASM will automatically generate a near indirect jump; if the variable is a dword, MASM emits the opcode for a far indirect jump. Some forms of memory addressing, unfortunately, do not intrinsically specify a size. For example, [bx] is definitely a memory operand, but does bx point at a word variable or a double word variable? It could point at either. Therefore, MASM will reject a statement of the form:

		jmp	[bx]

MASM cannot tell whether this should be a near indirect or far indirect jump. To resolve the ambiguity, you will need to use a type coercion operator. Chapter Eight will fully describe type coercion operators, for now, just use one of the following two instructions for a near or far jump, respectively:

		jmp	word ptr [bx]
		jmp	dword ptr [bx]

The register indirect addressing modes are not the only ones that could be type ambiguous. You could also run into this problem with indexed and base plus index addressing modes:

		jmp	word ptr 5[bx]
		jmp	dword ptr 9[bx][si]

For more information on the type coercion operators, see Chapter Eight.

In theory, you could use the indirect jump instructions and the setcc instructions to conditionally transfer control to some given location. For example, the following code transfers control to iftrue if word variable X is equal to word variable Y. It transfers control to iffalse, otherwise.

JmpTbl          word    iffalse, iftrue
                 .
                 .
                 .
                mov     ax, X
                cmp     ax, Y
                sete    bl
                movzx   ebx, bl
                jmp     JmpTbl[ebx*2]

As you will soon see, there is a much better way to do this using the conditional jump instructions.

6.9.2 The CALL and RET Instructions

The call and ret instructions handle subroutine calls and returns. There are five different call instructions and six different forms of the return instruction:

        call    disp16  ;direct intrasegment, 16 bit relative.
        call    adrs32  ;direct intersegment, 32 bit segmented address.
        call    mem16   ;indirect intrasegment, 16 bit memory pointer.
        call    reg16   ;indirect intrasegment, 16 bit register pointer.
        call    mem32   ;indirect intersegment, 32 bit memory pointer.

        ret             ;near or far return
        retn            ;near return
        retf            ;far return
        ret     disp    ;near or far return and pop
        retn    disp    ;near return and pop
        retf    disp    ;far return and pop

The call instructions take the same forms as the jmp instructions except there is no short (two byte) intrasegment call.

The far call instruction does the following:

It pushes the cs register onto the stack.
It pushes the 16 bit offset of the next instruction following the call onto the stack.
It copies the 32 bit effective address into the cs:ip registers. Since the call instruction allows the same addressing modes as jmp, call can obtain the target address using a relative, memory, or register addressing mode.
Execution continues at the first instruction of the subroutine. This first instruction is the opcode at the target address computed in the previous step.

The near call instruction does the following:

It pushes the 16 bit offset of the next instruction following the call onto the stack.
It copies the 16 bit effective address into the ip register. Since the call instruction allows the same addressing modes as jmp, call can obtain the target address using a relative, memory, or register addressing mode.
Execution continues at the first instruction of the subroutine. This first instruction is the opcode at the target address computed in the previous step.

The call disp16 instruction uses relative addressing. You can compute the effective address of the target by adding this 16 bit displacement with the return address (like the relative jmp instructions, the displacement is the distance from the instruction following the call to the target address).

The call adrs32 instruction uses the direct addressing mode. A 32 bit segmented address immediately follows the call opcode. This form of the call instruction copies that value directly into the cs:ip register pair. In many respects, this is equivalent to the immediate addressing mode since the value this instruction copies into the cs:ip register pair immediately follows the instruction.

Call mem16 uses the memory indirect addressing mode. Like the jmp instruction, this form of the call instruction fetches the word at the specified memory location and uses that word's value as the target address. Remember, you can use any memory addressing mode with this instruction. The displacement-only addressing mode is the most common form, but the others are just as valid:

	call	CallTbl[bx]	;Index into an array of pointers.
	call	word ptr [bx]	;BX points at word to use.
	call	WordTbl[bx][si]	; etc.

Note that the selection of addressing mode only affects the effective address computation for the target subroutine. These call instructions still push the offset of the next instruction following the call onto the stack. Since these are near calls (they obtain their target address from a 16 bit memory location), they all push a 16 bit return address onto the stack.

Call reg16 works just like the memory indirect call above, except it uses the 16 bit value in a register for the target address. This instruction is really the same instruction as the call mem16 instruction. Both forms specify their effective address using a mod-reg-r/m byte. For the call reg16 form, the mod bits contain 11b so the r/m field specifies a register rather than a memory addressing mode. Of course, this instruction also pushes the 16 bit offset of the next instruction onto the stack as the return address.

The call mem32 instruction is a far indirect call. The memory address specified by this instruction must be a double word value. This form of the call instruction fetches the 32 bit segmented address at the computed effective address and copies this double word value into the cs:ip register pair. This instruction also copies the 32 bit segmented address of the next instruction onto the stack (it pushes the segment value first and the offset portion second). Like the call mem16 instruction, you can use any valid memory addressing mode with this instruction:

                call    DWordVar
                call    DwordTbl[bx]
                call    dword ptr [bx]
                etc.

It is relatively easy to synthesize the call instruction using two or three other 80x86 instructions. You could create the equivalent of a near call using a push and a jmp instruction:

		push	<offset of instruction after jmp>
		jmp	subroutine

A far call would be similar, you'd need to add a

push
cs

instruction before the two instructions above to push a far return address on the stack.

The ret (return) instruction returns control to the caller of a subroutine. It does so by popping the return address off the stack and transferring control to the instruction at this return address. Intrasegment (near) returns pop a 16 bit return address off the stack into the ip register. An intersegment (far) return pops a 16 bit offset into the ip register and then a 16 bit segment value into the cs register. These instructions are effectively equal to the following:

retn:		pop	ip
retf:		popd	cs:ip

Clearly, you must match a near subroutine call with a near return and a far subroutine call with a corresponding far return. If you mix near calls with far returns or vice versa, you will leave the stack in an inconsistent state and you probably will not return to the proper instruction after the call. Of course, another important issue when using the call and ret instructions is that you must make sure your subroutine doesn't push something onto the stack and then fail to pop it off before trying to return to the caller. Stack problems are a major cause of errors in assembly language subroutines. Consider the following code:

Subroutine:     push    ax
                push    bx
                 .
                 .
                 .
                pop     bx
                ret
                 .
                 .
                 .
                call    Subroutine

The call instruction pushes the return address onto the stack and then transfers control to the first instruction of subroutine. The first two push instructions push the ax and bx registers onto the stack, presumably in order to preserve their value because subroutine modifies them. Unfortunately, a programming error exists in the code above, subroutine only pops bx from the stack, it fails to pop ax as well. This means that when subroutine tries to return to the caller, the value of ax rather than the return address is sitting on the top of the stack. Therefore, this subroutine returns control to the address specified by the initial value of the ax register rather than to the true return address. Since there are 65,536 different values ax can have, there is a 1/65,536th of a chance that your code will return to the real return address. The odds are not in your favor! Most likely, code like this will hang up the machine. Moral of the story - always make sure the return address is sitting on the stack before executing the return instruction.

Like the call instruction, it is very easy to simulate the ret instruction using two 80x86 instructions. All you need to do is pop the return address off the stack and then copy it into the ip register. For near returns, this is a very simple operation, just pop the near return address off the stack and then jump indirectly through that register:

		pop	ax
		jmp	ax

Simulating a far return is a little more difficult because you must load cs:ip in a single operation. The only instruction that does this (other than a far return) is the jmp mem32 instruction. See the exercises at the end of this chapter for more details.

There are two other forms of the ret instruction. They are identical to those above except a 16 bit displacement follows their opcodes. The CPU adds this value to the stack pointer immediately after popping the return address from the stack. This mechanism removes parameters pushed onto the stack before returning to the caller. See Chapter Eleven for more details.

The assembler allows you to type ret without the "f" or "n" suffix. If you do so, the assembler will figure out whether it should generate a near return or a far return. See the chapter on procedures and functions for details on this.

6.9.3 The INT, INTO, BOUND, and IRET Instructions

The int (for software interrupt) instruction is a very special form of a call instruction. Whereas the call instruction calls subroutines within your program, the int instruction calls system routines and other special subroutines. The major difference between interrupt service routines and standard procedures is that you can have any number of different procedures in an assembly language program, while the system supports a maximum of 256 different interrupt service routines. A program calls a subroutine by specifying the address of that subroutine; it calls an interrupt service routine by specifying the interrupt number for that particular interrupt service routine. This chapter will only describe how to call an interrupt service routine using the int, into, and bound instructions, and how to return from an interrupt service routine using the iret instruction.

There are four different forms of the int instruction. The first form is

		int	nn

(where "nn" is a value between 0 and 255). It allows you to call one of 256 different interrupt routines. This form of the int instruction is two bytes long. The first byte is the int opcode. The second byte is immediate data containing the interrupt number.

Although you can use the int instruction to call procedures (interrupt service routines) you've written, the primary purpose of this instruction is to make a system call. A system call is a subroutine call to a procedure provided by the system, such as a DOS , PC-BIOS, mouse, or some other piece of software resident in the machine before your program began execution. Since you always refer to a specific system call by its interrupt number, rather than its address, your program does not need to know the actual address of the subroutine in memory. The int instruction provides dynamic linking to your program. The CPU determines the actual address of an interrupt service routine at run time by looking up the address in an interrupt vector table. This allows the authors of such system routines to change their code (including the entry point) without fear of breaking any older programs that call their interrupt service routines. As long as the system call uses the same interrupt number, the CPU will automatically call the interrupt service routine at its new address.

The only problem with the int instruction is that it supports only 256 different interrupt service routines. MS-DOS alone supports well over 100 different calls. BIOS and other system utilities provide thousands more. This is above and beyond all the interrupts reserved by Intel for hardware interrupts and traps. The common solution most of the system calls use is to employ a single interrupt number for a given class of calls and then pass a function number in one of the 80x86 registers (typically the ah register). For example, MS-DOS uses only a single interrupt number, 21h. To choose a particular DOS function, you load a DOS function code into the ah register before executing the

int 21h

instruction. For example, to terminate a program and return control to MS-DOS, you would normally load ah with 4Ch and call DOS with the

int
21h

instruction:

		mov   ah, 4ch  ;DOS terminate opcode.
		int   21h      ;DOS call

The BIOS keyboard interrupt is another good example. Interrupt 16h is responsible for testing the keyboard and reading data from the keyboard. This BIOS routine provides several calls to read a character and scan code from the keyboard, see if any keys are available in the system type ahead buffer, check the status of the keyboard modifier flags, and so on. To choose a particular operation, you load the function number into the ah register before executing int 16h. The following table lists the possible functions:


BIOS Keyboard Support Functions
Function #

(AH) Input

Parameters Output

Parameters Description

0 - al- ASCII character

ah- scan code Read character. Reads next available character from the system's type ahead buffer. Wait for a keystroke if the buffer is empty.

1 - ZF- Set if no key.

ZF- Clear if key available.

al- ASCII code

ah- scan code Checks to see if a character is available in the type ahead buffer. Sets the zero flag if not key is available, clears the zero flag if a key is available. If there is an available key, this function returns the ASCII and scan code value in ax. The value in ax is undefined if no key is available.

2 - al- shift flags Returns the current status of the shift flags in al. The shift flags are defined as follows:

 

bit 7: Insert toggle

bit 6: Capslock toggle

bit 5: Numlock toggle

bit 4: Scroll lock toggle

bit 3: Alt key is down

bit 2: Ctrl key is down

bit 1: Left shift key is down

bit 0: Right shift key is down

3 al = 5

bh = 0, 1, 2, 3 for 1/4, 1/2, 3/4, or 1 second delay

bl= 0..1Fh for 30/sec to 2/sec. - Set auto repeat rate. The bh register contains the amount of time to wait before starting the autorepeat operation, the bl register contains the autorepeat rate.

5 ch = scan code

cl = ASCII code - Store keycode in buffer. This function stores the value in the cx register at the end of the type ahead buffer. Note that the scan code in ch doesn't have to correspond to the ASCII code appearing in cl. This routine will simply insert the data you provide into the system type ahead buffer.

10h - al- ASCII character

ah- scan code Read extended character. Like ah=0 call, except this one passes all key codes, the ah=0 call throws away codes that are not PC/XT compatible.

11h - ZF- Set if no key.

ZF- Clear if key available.

al- ASCII code

ah- scan code Like the ah=01h call except this one does not throw away keycodes that are not PC/XT compatible (i.e., the extra keys found on the 101 key keyboard).

12h - al- shift flags

ah- extended shift flags Returns the current status of the shift flags in ax. The shift flags are defined as follows:

 

bit 15: SysReq key pressed

bit 14: Capslock key currently down

bit 13: Numlock key currently down

bit 12: Scroll lock key currently down

bit 11: Right alt key is down

bit 10:Right ctrl key is down

bit 9: Left alt key is down

bit 8: Left ctrl key is down

bit 7: Insert toggle

bit 6: Capslock toggle

bit 5: Numlock toggle

bit 4: Scroll lock toggle

bit 3: Either alt key is down (some machines, left only)

bit 2: Either ctrl key is down

bit 1: Left shift key is down

bit 0: Right shift key is down

BIOS Keyboard Support Functions
Function # (AH)	Input Parameters	Output Parameters	Description
0	-	`al`- ASCII character `ah`- scan code	Read character. Reads next available character from the system's type ahead buffer. Wait for a keystroke if the buffer is empty.
1	-	ZF- Set if no key. ZF- Clear if key available. `al`- ASCII code `ah`- scan code	Checks to see if a character is available in the type ahead buffer. Sets the zero flag if not key is available, clears the zero flag if a key is available. If there is an available key, this function returns the ASCII and scan code value in `ax`. The value in `ax` is undefined if no key is available.
2	-	al- shift flags	Returns the current status of the shift flags in al. The shift flags are defined as follows: bit 7: Insert toggle bit 6: Capslock toggle bit 5: Numlock toggle bit 4: Scroll lock toggle bit 3: Alt key is down bit 2: Ctrl key is down bit 1: Left shift key is down bit 0: Right shift key is down
3	`al` = 5 `bh` = 0, 1, 2, 3 for 1/4, 1/2, 3/4, or 1 second delay `bl`= 0..1Fh for 30/sec to 2/sec.	-	Set auto repeat rate. The `bh` register contains the amount of time to wait before starting the autorepeat operation, the `bl` register contains the autorepeat rate.
5	`ch` = scan code `cl` = ASCII code	-	Store keycode in buffer. This function stores the value in the `cx` register at the end of the type ahead buffer. Note that the scan code in `ch` doesn't have to correspond to the ASCII code appearing in `cl`. This routine will simply insert the data you provide into the system type ahead buffer.
10h	-	`al`- ASCII character `ah`- scan code	Read extended character. Like `ah`=0 call, except this one passes all key codes, the `ah`=0 call throws away codes that are not PC/XT compatible.
11h	-	ZF- Set if no key. ZF- Clear if key available. `al`- ASCII code `ah`- scan code	Like the ah=01h call except this one does not throw away keycodes that are not PC/XT compatible (i.e., the extra keys found on the 101 key keyboard).
12h	-	al- shift flags ah- extended shift flags	Returns the current status of the shift flags in ax. The shift flags are defined as follows: bit 15: SysReq key pressed bit 14: Capslock key currently down bit 13: Numlock key currently down bit 12: Scroll lock key currently down bit 11: Right alt key is down bit 10:Right ctrl key is down bit 9: Left alt key is down bit 8: Left ctrl key is down bit 7: Insert toggle bit 6: Capslock toggle bit 5: Numlock toggle bit 4: Scroll lock toggle bit 3: Either alt key is down (some machines, left only) bit 2: Either ctrl key is down bit 1: Left shift key is down bit 0: Right shift key is down

For example, to read a character from the system type ahead buffer, leaving the ASCII code in al, you could use the following code:

        mov     ah, 0           ;Wait for key available, and then
        int     16h             ; read that key.
        mov     character, al   ;Save character read.

Likewise, if you wanted to test the type ahead buffer to see if a key is available, without reading that keystroke, you could use the following code:

        mov     ah, 1   ;Test to see if key is available.
        int     16h     ;Sets the zero flag if a key is not
                        ; available.

The second form of the int instruction is a special case:

		int	 3

Int 3 is a special form of the interrupt instruction that is only one byte long. CodeView and other debuggers use it as a software breakpoint instruction. Whenever you set a breakpoint on an instruction in your program, the debugger will typically replace the first byte of the instruction's opcode with an int 3 instruction. When your program executes the int 3 instruction, this makes a "system call" to the debugger so the debugger can regain control of the CPU. When this happens, the debugger will replace the int 3 instruction with the original opcode.

While operating inside a debugger, you can explicitly use the

int
3

instruction to stop program executing and return control to the debugger. This is not, however, the normal way to terminate a program. If you attempt to execute an int 3 instruction while running under DOS, rather than under the control of a debugger program, you will likely crash the system.

The third form of the int instruction is into. Into will cause a software breakpoint if the 80x86 overflow flag is set. You can use this instruction to quickly test for arithmetic overflow after executing an arithmetic instruction. Semantically, this instruction is equivalent to

		if overflow = 1 then int 4

You should not use this instruction unless you've supplied a corresponding trap handler (interrupt service routine). Doing so would probably crash the system. .

The fourth software interrupt, provided by 80286 and later processors, is the bound instruction. This instruction takes the form

		bound	reg, mem

and executes the following algorithm:

	if (reg < [mem]) or (reg > [mem+sizeof(reg)]) then int 5

[mem] denotes the contents of the memory location mem and sizeof(reg) is two or four depending on whether the register is 16 or 32 bits wide. The memory operand must be twice the size of the register operand. The bound instruction compares the values using a signed integer comparison.

Intel's designers added the bound instruction to allow a quick check of the range of a value in a register. This is useful in Pascal, for example, which checking array bounds validity and when checking to see if a subrange integer is within an allowable range. There are two problems with this instruction, however. On 80486 and Pentium/586 processors, the bound instruction is generally slower than the sequence of instructions it would replace:

                cmp     reg, LowerBound
                jl      OutOfBounds
                cmp     reg, UpperBound
                jg      OutOfBounds

On the 80486 and Pentium/586 chips, the sequence above only requires four clock cycles assuming you can use the immediate addressing mode and the branches are not taken; the bound instruction requires 7-8 clock cycles under similar circumstances and also assuming the memory operands are in the cache.

A second problem with the bound instruction is that it executes an int 5 if the specified register is out of range. IBM, in their infinite wisdom, decided to use the int 5 interrupt handler routine to print the screen. Therefore, if you execute a bound instruction and the value is out of range, the system will, by default, print a copy of the screen to the printer. If you replace the default

int
5

handler with one of your own, pressing the PrtSc key will transfer control to your bound instruction handler. Although there are ways around this problem, most people don't bother since the bound instruction is so slow.

Whatever int instruction you execute, the following sequence of events follows:

The 80x86 pushes the flags register onto the stack;
The 80x86 pushes cs and then ip onto the stack;
The 80x86 uses the interrupt number (into is interrupt #4, bound is interrupt #5) times four as an index into the interrupt vector table and copies the double word at that point in the table into cs:ip.

The int instructions vary from a call in two major ways. First, call instructions vary in length from two to six bytes long, whereas int instructions are generally two bytes long (int 3, into, and bound are the exceptions). Second, and most important, the int instruction pushes the flags and the return address onto the stack while the call instruction pushes only the return address. Note also that the int instructions always push a far return address (i.e., a cs value and an offset within the code segment), only the far call pushes this double word return address.

Since int pushes the flags onto the stack you must use a special return instruction, iret (interrupt return), to return from a routine called via the int instructions. If you return from an interrupt procedure using the ret instruction, the flags will be left on the stack upon returning to the caller. The iret instruction is equivalent to the two instruction sequence: ret, popf (assuming, of course, that you execute popf before returning control to the address pointed at by the double word on the top of the stack).

The int instructions clear the trace (T) flag in the flags register. They do not affect any other flags. The iret instruction, by its very nature, can affect all the flags since it pops the flags from the stack.

6.9.4 The Conditional Jump Instructions

Although the jmp, call, and ret instructions provide transfer of control, they do not allow you to make any serious decisions. The 80x86's conditional jump instructions handle this task. The conditional jump instructions are the basic tool for creating loops and other conditionally executable statements like the if..then statement.

The conditional jumps test one or more flags in the flags register to see if they match some particular pattern (just like the setcc instructions). If the pattern matches, control transfers to the target location. If the match fails, the CPU ignores the conditional jump and execution continues with the next instruction. Some instructions, for example, test the conditions of the sign, carry, overflow, and zero flags. For example, after the execution of a shift left instruction, you could test the carry flag to determine if it shifted a one out of the H.O. bit of its operand. Likewise, you could test the condition of the zero flag after a test instruction to see if any specified bits were one. Most of the time, however, you will probably execute a conditional jump after a cmp instruction. The cmp instruction sets the flags so that you can test for less than, greater than, equality, etc.

Note: Intel's documentation defines various synonyms or instruction aliases for many conditional jump instructions. The following tables list all the aliases for a particular instruction. These tables also list out the opposite branches. You'll soon see the purpose of the opposite branches.

Jcc Instructions That Test Flags
Instruction	Description	Condition	Aliases	Opposite
JC	Jump if carry	Carry = 1	JB, JNAE	JNC
JNC	Jump if no carry	Carry = 0	JNB, JAE	JC
JZ	Jump if zero	Zero = 1	JE	JNZ
JNZ	Jump if not zero	Zero = 0	JNE	JZ
JS	Jump if sign	Sign = 1	-	JNS
JNS	Jump if no sign	Sign = 0	-	JS
JO	Jump if overflow	Ovrflw=1	-	JNO
JNO	Jump if no Ovrflw	Ovrflw=0	-	JO
JP	Jump if parity	Parity = 1	JPE	JNP
JPE	Jump if parity even	Parity = 1	JP	JPO
JNP	Jump if no parity	Parity = 0	JPO	JP
JPO	Jump if parity odd	Parity = 0	JNP	JPE


Jcc Instructions for Unsigned Comparisons
Instruction Description Condition Aliases Opposite

JA Jump if above (>) Carry=0, Zero=0 JNBE JNA

JNBE Jump if not below or equal (not <=) Carry=0, Zero=0 JA JBE

JAE Jump if above or equal (>=) Carry = 0 JNC, JNB JNAE

JNB Jump if not below (not <) Carry = 0 JNC, JAE JB

JB Jump if below (<) Carry = 1 JC, JNAE JNB

JNAE Jump if not above or equal (not >=) Carry = 1 JC, JB JAE

JBE Jump if below or equal (<=) Carry = 1 or 

Zero = 1 JNA JNBE

JNA Jump if not above 

(not >) Carry = 1 or

Zero = 1 JBE JA

JE Jump if equal (=) Zero = 1 JZ JNE

JNE Jump if not equal () Zero = 0 JNZ JE



Jcc Instructions for Signed Comparisons
Instruction Description Condition Aliases Opposite

JG Jump if greater (>) Sign = Ovrflw or Zero=0 JNLE JNG

JNLE Jump if not less than or equal (not <=) Sign = Ovrflw or Zero=0

  JG JLE

JGE Jump if greater than or equal (>=) Sign = Ovrflw JNL JGE

JNL Jump if not less than (not <) Sign = Ovrflw JGE JL

JL Jump if less than (<) Sign  Ovrflw JNGE JNL

JNGE Jump if not greater or equal (not >=) Sign  Ovrflw

  JL JGE

JLE Jump if less than or equal (<=) Sign  Ovrflw or 

Zero = 1 JNG JNLE

JNG Jump if not greater than (not >) Sign  Ovrflw or

Zero = 1 JLE JG

JE Jump if equal (=) Zero = 1 JZ JNE

JNE Jump if not equal () Zero = 0 JNZ JE

Jcc Instructions for Unsigned Comparisons
Instruction	Description	Condition	Aliases	Opposite
JA	Jump if above (>)	Carry=0, Zero=0	JNBE	JNA
JNBE	Jump if not below or equal (not <=)	Carry=0, Zero=0	JA	JBE
JAE	Jump if above or equal (>=)	Carry = 0	JNC, JNB	JNAE
JNB	Jump if not below (not <)	Carry = 0	JNC, JAE	JB
JB	Jump if below (<)	Carry = 1	JC, JNAE	JNB
JNAE	Jump if not above or equal (not >=)	Carry = 1	JC, JB	JAE
JBE	Jump if below or equal (<=)	Carry = 1 or Zero = 1	JNA	JNBE
JNA	Jump if not above (not >)	Carry = 1 or Zero = 1	JBE	JA
JE	Jump if equal (=)	Zero = 1	JZ	JNE
JNE	Jump if not equal ()	Zero = 0	JNZ	JE

Jcc Instructions for Signed Comparisons
Instruction	Description	Condition	Aliases	Opposite
JG	Jump if greater (>)	Sign = Ovrflw or Zero=0	JNLE	JNG
JNLE	Jump if not less than or equal (not <=)	Sign = Ovrflw or Zero=0	JG	JLE
JGE	Jump if greater than or equal (>=)	Sign = Ovrflw	JNL	JGE
JNL	Jump if not less than (not <)	Sign = Ovrflw	JGE	JL
JL	Jump if less than (<)	Sign Ovrflw	JNGE	JNL
JNGE	Jump if not greater or equal (not >=)	Sign Ovrflw	JL	JGE
JLE	Jump if less than or equal (<=)	Sign Ovrflw or Zero = 1	JNG	JNLE
JNG	Jump if not greater than (not >)	Sign Ovrflw or Zero = 1	JLE	JG
JE	Jump if equal (=)	Zero = 1	JZ	JNE
JNE	Jump if not equal ()	Zero = 0	JNZ	JE

On the 80286 and earlier, these instructions are all two bytes long. The first byte is a one byte opcode followed by a one byte displacement. Although this leads to very compact instructions, a single byte displacement only allows a range of ±128 bytes. There is a simple trick you can use to overcome this limitation on these earlier processors:

Whatever jump you're using, switch to its opposite form. (given in the tables above).
Once you've selected the opposite branch, use it to jump over a jmp instruction whose target address is the original target address.

For example, to convert:

		jc	Target

to the long form, use the following sequence of instructions:

		jnc	SkipJmp
 		jmp	Target
SkipJmp:

If the carry flag is clear (NC=no carry), then control transfers to label SkipJmp, at the same point you'd be if you were using the jc instruction above. If the carry flag is set when encountering this sequence, control will fall through the jnc instruction to the jmp instruction that will transfer control to Target. Since the jmp instruction allows 16 bit displacement and far operands, you can jump anywhere in the memory using this trick.

One brief comment about the "opposites" column is in order. As mentioned above, when you need to manually extend a branch from ±128 you should choose the opposite branch to branch around a jump to the target location. As you can see in the "aliases" column above, many conditional jump instructions have aliases. This means that there will be aliases for the opposite jumps as well. Do not use any aliases when extending branches that are out of range. With only two exceptions, a very simple rule completely describes how to generate an opposite branch:

If the second letter of the jcc instruction is not an "n", insert an "n" after the "j". E.g., je becomes jne and jl becomes jnl.
If the second letter of the jcc instruction is an "n", then remove that "n" from the instruction. E.g., jng becomes jg, jne becomes je.

The two exceptions to this rule are jpe (jump parity even) and jpo (jump parity odd). These exceptions cause few problems because (a) you'll hardly ever need to test the parity flag, and (b) you can use the aliases jp and jnp synonyms

for jpe and jpo. The "N/No N" rule applies to jp and jnp.

Though you know that jge is the opposite of jl, get in the habit of using jnl rather than jge. It's too easy in an important situation to start thinking "greater is the opposite of less" and substitute jg instead. You can avoid this confusion by always using the "N/No N" rule.

MASM 6.x and many other modern 80x86 assemblers will automatically convert out of range branches to this sequence for you. There is an option that will allow you to disable this feature. For performance critical code that runs on 80286 and earlier processors, you may want to disable this feature so you can fix the branches yourself. The reason is quite simple, this simple fix always wipes out the pipeline no matter which condition is true since the CPU jumps in either case. One thing nice about conditional jumps is that you do not flush the pipeline or the prefetch queue if you do not take the branch. If one condition is true far more often than the other, you might want to use the conditional jump to transfer control to a jmp nearby, so you can continue to fall through as before. For example, if you have a je target instruction and target is out of range, you could convert it to the following code:

                je      GotoTarget
                 .
                 .
                 .
GotoTarget:     jmp     Target

Although a branch to target now requires executing two jumps, this is much more efficient than the standard conversion if the zero flag is normally clear when executing the je instruction.

The 80386 and later processor provide an extended form of the conditional jump that is four bytes long, with the last two bytes containing a 16 bit displacement. These conditional jumps can transfer control anywhere within the current code segment. Therefore, there is no need to worry about manually extending the range of the jump. If you've told MASM you're using an 80386 or later processor, it will automatically choose the two byte or four byte form, as necessary. See Chapter Eight to learn how to tell MASM you're using an 80386 or later processor.

The 80x86 conditional jump instruction give you the ability to split program flow into one of two paths depending upon some logical condition. Suppose you want to increment the ax register if bx is or equal to cx. You can accomplish this with the following code:

                cmp     bx, cx
                jne     SkipStmts
                inc     ax
SkipStmts:

The trick is to use the opposite branch to skip over the instructions you want to execute if the condition is true. Always use the "opposite branch (N/no N)" rule given earlier to select the opposite branch. You can make the same mistake choosing an opposite branch here as you could when extending out of range jumps.

You can also use the conditional jump instructions to synthesize loops. For example, the following code sequence reads a sequence of characters from the user and stores each character in successive elements of an array until the user presses the Enter key (carriage return):

                mov     di, 0
ReadLnLoop:     mov     ah, 0           ;INT 16h read key opcode.
                int     16h
                mov     Input[di], al
                inc     di
                cmp     al, 0dh         ;Carriage return ASCII code.
                jne     ReadLnLoop
                mov     Input[di-1],0   ;Replace carriage return with zero.

For more information concerning the use of the conditional jumps to synthesize IF statements, loops, and other control structures, see Chapter Ten.

Like the setcc instructions, the conditional jump instructions come in two basic categories - those that test specific process flag values (e.g., jz, jc, jno) and those that test some condition ( less than, greater than, etc.). When testing a condition, the conditional jump instructions almost always follow a cmp instruction. The cmp instruction sets the flags so you can use a ja, jae, jb, jbe, je, or jne instruction to test for unsigned less than, less than or equal, equality, inequality, greater than, or greater than or equal. Simultaneously, the cmp instruction sets the flags so you can also do a signed comparison using the jl, jle, je, jne, jg, and jge instructions.

The conditional jump instructions only test flags, they do not affect any of the 80x86 flags.

6.9.5 The JCXZ/JECXZ Instructions

The jcxz (jump if cx is zero) instruction branches to the target address if cx contains zero. Although you can use it anytime you need to see if cx contains zero, you would normally use it before a loop you've constructed with the loop instructions. The loop instruction can repeat a sequence of operations cx times. If cx equals zero, loop will repeat the operation 65,536 times. You can use jcxz to skip over such a loop when cx is zero.

The jecxz instruction, available only on 80386 and later processors, does essentially the same job as jcxz except it tests the full ecx register. Note that the jcxz instruction only checks cx, even on an 80386 in 32 bit mode.

There are no "opposite" jcxz or jecxz instructions. Therefore, you cannot use "N/No N" rule to extend the jcxz and jecxz instructions. The easiest way to solve this problem is to break the instruction up into two instructions that accomplish the same task:

		jcxz	Target

becomes

		test	cx, cx		;Sets the zero flag if cx=0
		je	Target

Now you can easily extend the je instruction using the techniques from the previous section.

The test instruction above will set the zero flag if and only if cx contains zero. After all, if there are any non-zero bits in cx, logically anding them with themselves will produce a non-zero result. This is an efficient way to see if a 16 or 32 bit register contains zero. In fact, this two instruction sequence is faster than the jcxz instruction on the 80486 and later processors. Indeed, Intel recommends the use of this sequence rather than the jcxz instruction if you are concerned with speed. Of course, the jcxz instruction is shorter than the two instruction sequence, but it is not faster. This is a good example of an exception to the rule "shorter is usually faster."

The jcxz instruction does not affect any flags.

6.9.6 The LOOP Instruction

This instruction decrements the cx register and then branches to the target location if the cx register does not contain zero. Since this instruction decrements cx then checks for zero, if cx originally contained zero, any loop you create using the loop instruction will repeat 65,536 times. If you do not want to execute the loop when cx contains zero, use jcxz to skip over the loop.

There is no "opposite" form of the loop instruction, and like the jcxz/jecxz instructions the range is limited to ±128 bytes on all processors. If you want to extend the range of this instruction, you will need to break it down into discrete components:

; "loop lbl" becomes:

		dec	cx
		jne	lbl

You can easily extend this jne to any distance.

There is no eloop instruction that decrements ecx and branches if not zero (there is a loope instruction, but it does something else entirely). The reason is quite simple. As of the 80386, Intel's designers stopped wholeheartedly supporting the loop instruction. Oh, it's there to ensure compatibility with older code, but it turns out that the dec/jne instructions are actually faster on the 32 bit processors. Problems in the decoding of the instruction and the operation of the pipeline are responsible for this strange turn of events.

Although the loop instruction's name suggests that you would normally create loops with it, keep in mind that all it is really doing is decrementing cx and branching to the target address if cx does not contain zero after the decrement. You can use this instruction anywhere you want to decrement cx and then check for a zero result, not just when creating loops. Nonetheless, it is a very convenient instruction to use if you simply want to repeat a sequence of instructions some number of times. For example, the following loop initializes a 256 element array of bytes to the values 1, 2, 3, ...

                mov     ecx, 255
ArrayLp:        mov     Array[ecx], cl
                loop    ArrayLp
                mov     Array[0], 0

The last instruction is necessary because the loop does not repeat when cx is zero. Therefore, the last element of the array that this loop processes is Array[1], hence the last instruction.

The loop instruction does not affect any flags.

6.9.7 The LOOPE/LOOPZ Instruction

Loope/loopz (loop while equal/zero, they are synonyms for one another) will branch to the target address if cx is not zero and the zero flag is set. This instruction is quite useful after cmp or cmps instruction, and is marginally faster than the comparable 80386/486 instructions if you use all the features of this instruction. However, this instruction plays havoc with the pipeline and superscalar operation of the Pentium so you're probably better off sticking with discrete instructions rather than using this instruction. This instruction does the following:

	cx := cx - 1
	if ZeroFlag = 1 and cx  0, goto target

The loope instruction falls through on one of two conditions. Either the zero flag is clear or the instruction decremented cx to zero. By testing the zero flag after the loop instruction (with a je or jne instruction, for example), you can determine the cause of termination.

This instruction is useful if you need to repeat a loop while some value is equal to another, but there is a maximum number of iterations you want to allow. For example, the following loop scans through an array looking for the first non-zero byte, but it does not scan beyond the end of the array:

                mov     cx, 16          ;Max 16 array elements.
                mov     bx, -1          ;Index into the array (note next inc).
SearchLp:       inc     bx              ;Move on to next array element.
                cmp     Array[bx], 0    ;See if this element is zero.
                loope   SearchLp        ;Repeat if it is.
                je      AllZero         ;Jump if all elements were zero.

Note that this instruction is not the opposite of loopnz/loopne. If you need to extend this jump beyond ±128 bytes, you will need to synthesize this instruction using discrete instructions. For example, if loope target is out of range, you would need to use an instruction sequence like the following:

                jne     quit
                dec     cx
                je      Quit2
                jmp     Target
quit:           dec     cx      ;loope decrements cx, even if ZF=0.
quit2:

The loope/loopz instruction does not affect any flags.

6.9.8 The LOOPNE/LOOPNZ Instruction

This instruction is just like the loope/loopz instruction in the previous section except

loopne/loopnz

(loop while not equal/not zero) repeats while cx is not zero and the zero flag is clear. The algorithm is

	cx := cx - 1
	if ZeroFlag = 0 and cx  0, goto target

You can determine if the loopne instruction terminated because cx was zero or if the zero flag was set by testing the zero flag immediately after the loopne instruction. If the zero flag is clear at that point, the loopne instruction fell through because it decremented cx to zero. Otherwise it fell through because the zero flag was set.

This instruction is not the opposite of loope/loopz. If the target address is out of range, you will need to use an instruction sequence like the following:

                je      quit
                dec     cx
                je      Quit2
                jmp     Target
quit:           dec     cx      ;loopne decrements cx, even if ZF=1.
quit2:

You can use the loopne instruction to repeat some maximum number of times while waiting for some other condition to be true. For example, you could scan through an array until you exhaust the number of array elements or until you find a certain byte using a loop like the following:

                mov     cx, 16          ;Maximum # of array elements.
                mov     bx, -1          ;Index into array.
LoopWhlNot0:    inc     bx              ;Move on to next array element.
                cmp     Array[bx],0     ;Does this element contain zero?
                loopne  LoopWhlNot0     ;Quit if it does, or more than 16 bytes.

Although the loope/loopz and loopne/loopnz instructions are slower than the individual instruction from which they could be synthesized, there is one main use for these instruction forms where speed is rarely important; indeed, being faster would make them less useful - timeout loops during I/O operations. Suppose bit #7 of input port 379h contains a one if the device is busy and contains a zero if the device is not busy. If you want to output data to the port, you could use code like the following:

                mov     dx, 379h
WaitNotBusy:    in      al, dx          ;Get port
                test    al, 80h         ;See if bit #7 is one
                jne     WaitNotBusy     ;Wait for "not busy"

The only problem with this loop is that it is conceivable that it would loop forever. In a real system, a cable could come unplugged, someone could shut off the peripheral device, and any number of other things could go wrong that would hang up the system. Robust programs usually apply a timeout to a loop like this. If the device fails to become busy within some specified amount of time, then the loop exits and raises an error condition. The following code will accomplish this:

                mov     dx, 379h        ;Input port address
                mov     cx, 0           ;Loop 65,536 times and then quit.
WaitNotBusy:    in      al, dx          ;Get data at port.
                test    al, 80h         ;See if busy
                loopne  WaitNotBusy     ;Repeat if busy and no time out.
                jne     TimedOut        ;Branch if CX=0 because we timed out.

You could use the loope/loopz instruction if the bit were zero rather than one.

The loopne/loopnz instruction does not affect any flags.

6.10 Miscellaneous Instructions

There are various miscellaneous instructions on the 80x86 that don't fall into any category above. Generally these are instructions that manipulate individual flags, provide special processor services, or handle privileged mode operations.

There are several instructions that directly manipulate flags in the 80x86 flags register. They are

clc Clears the carry flag
stc Sets the carry flag
cmc Complements the carry flag
cld Clears the direction flag
std Sets the direction flag
cli Clears the interrupt enable/disable flag
sti Sets the interrupt enable/disable flag

Note: you should be careful when using the cli instruction in your programs. Improper use could lock up your machine until you cycle the power.

The nop instruction doesn't do anything except waste a few processor cycles and take up a byte of memory. Programmers often use it as a place holder or a debugging aid. As it turns out, this isn't a unique instruction, it's just a synonym for the xchg ax, ax instruction.

The hlt instruction halts the processor until a reset, non-maskable interrupt, or other interrupt (assuming interrupts are enabled) comes along. Generally, you shouldn't use this instruction on the IBM PC unless you really know what you are doing. This instruction is not equivalent to the x86 halt instruction. Do not use it to stop your programs.

The 80x86 provides another prefix instruction, lock, that, like the rep instruction, affects the following instruction. However, this instruction has little meaning on most PC systems. Its purpose is to coordinate systems that have multiple CPUs. As systems become available with multiple processors, this prefix may finally become valuable. You need not be too concerned about this here.

The Pentium provides two additional instructions of interest to real-mode DOS programmers. These instructions are cpuid and rdtsc. If you load eax with zero and execute the cpuid instruction, the Pentium (and later processors) will return the maximum value cpuid allows as a parameter in eax. For the Pentium, this value is one. If you load the eax register with one and execute the cpuid instruction, the Pentium will return CPU identification information in eax. Since this instruction is of little value until Intel produces several additional chips in the family, there is no need to consider it further, here.

The second Pentium instruction of interest is the rdtsc (read time stamp counter) instruction. The Pentium maintains a 64 bit counter that counts clock cycles starting at reset. The rdtsc instruction copies the current counter value into the edx:eax register pair. You can use this instruction to accurately time sequences of code.

Besides the instructions presented thus far, the 80286 and later processors provide a set of protected mode instructions. This text will not consider those protected most instructions that are useful only to those who are writing operating systems. You would not even use these instructions in your applications when running under a protected mode operating system like Windows, UNIX, or OS/2. These instructions are reserved for the individuals who write such operating systems and drivers for them.

6.9 - Program Flow Control Instructions
6.9.1 - Unconditional Jumps
6.9.2 - The CALL and RET Instructions
6.9.3 - The INT, INTO, BOUND, and IRET Instructions
6.9.4 - The Conditional Jump Instructions
6.9.5 - The JCXZ/JECXZ Instructions
6.9.6 - The LOOP Instruction
6.9.7 - The LOOPE/LOOPZ Instruction
6.9.8 - The LOOPNE/LOOPNZ Instruction
6.10 - Miscellaneous Instructions

Art of Assembly: Chapter Six - 26 SEP 1996

[Chapter Six][Previous] [Next] [Art of Assembly][Randall Hyde]